15-3 Comparison Methods

Once we have successfully detect the onset of tapping, we can compute the IOI (inter onset interval) vector for comparing with those of the songs in the database. The IOI vector can be computed as the difference between two successtive oneset times.

Once the similarity (or distance) between two IOI vectors is defined, then we can return the most likely songs in the database according to their similarity/distance to the query input. Before defining possible distance functions, we shall describe the ideal characteristics of the distance function for query by tapping, as follows.

For simplicity, we should use t = [t1, t2, ..., tm] and r = [r1, r2, ..., rn] to denote the test and the reference IOI vectors, respectively, with m The distance functions for IOI vectors can be defined as follows:
  • Simple Lp norm:
    distance=Lp(t(1:m), r(1:m))
    This distance function assume that the query input has the same tempo as that of the intended song in the database. Thus it cannot handle tempo difference.
  • To deal with tempo difference, we can define IOI ratio vector as a vector where each element is the ratio between consecutive elements in the original IOI vector. For instance, the ratio vector of t = [t1, t2, ..., tm] is t' = [t1/t2, t2/t3 ..., tm-1/tm]. Then the distance is
    distance=Lp(t'(1:m-1), r'(1:m-1))
    This distance function assume that the query input has a constant tempo, such that we can take the ratio of consecutive IOI elements and then compared with those of the database songs. However, this distance function cannot deal with insertion and deletion errors. Similar distance functions are listed next:
    distance=minkLp(t(1:m), kr(1:m))
    distance=variance of c(1:m), which c(i)=t(i)/r(i), i=1~m.
  • To be able to deal with tempo difference and insertion/deletion errors, we can apply an alignment algorithm based on dynamic programming. The approach is detailed in this paper and the corresponding slides.
Once we define the distance function, we can apply it to find the distance between the IOI query vector and that of each database songs. The system can then return the ranked list of all database songs. To give a single performance index of a method, we usually use the following figures:
  1. Top-n hit rate: The retrieval is considered successfuy if the intended song appears in the top n of the returned ranked list. Otherwise it is consider failed. Usually we set n equal to 10 since it is easier for a user to browse the returned result of 10 songs. Sometimes we change the value of n to see how the performance varies with the value of n.
  2. Mean reciprocal rank (MRR): If the intended song appears as rank r in the returned ranked list, then it get a score of 1/r. For instance, if the intended song appear at rank 4 of the ranked list, then it receive a score of 1/4 = 0.25.
The following example demonstrates QBT using normalized L1-norm:

Example 1: qbtRetrievalExample01.mfprintf('Reading the song database...\n'); songDb=songDbRead('childSong'); songNum=length(songDb); fprintf('Process the song database...\n'); for i=1:songNum songDb(i).ioi=double(songDb(i).track(2:2:end))/64; % In terms of sec. end fprintf('Process the query file...\n'); waveFile='tapping.wav'; [y, fs, nbits, opts, cueLabel]=wavReadInt(waveFile); au=myAudioRead(waveFile); odPrm=odPrmSet; onset=odByVol(au, odPrm); queryIoi=diff(onset)/fs; % In terms of sec. fprintf('Compare the query with the database...\n'); distance=zeros(songNum, 1); for i=1:songNum distance(i)=ioiDistance(queryIoi, songDb(i).ioi); end [sortedDistance, index]=sort(distance); fprintf('Top-10 results:\n'); for i=1:10 fprintf('%d: songName=%s, distance=%f\n', i, songDb(index(i)).songName, distance(index(i))); endReading the song database... Process the song database... Process the query file... [Warning: WAVREAD will be removed in a future release. Use AUDIOREAD instead.] [> In <a href="matlab:matlab.internal.language.introspective.errorDocCallback('wavread', 'E:\MATLAB\R2015a\toolbox\matlab\audiovideo\wavread.m', 62)" style="font-weight:bold">wavread</a> (<a href="matlab: opentoline('E:\MATLAB\R2015a\toolbox\matlab\audiovideo\wavread.m',62,0)">line 62</a>) In <a href="matlab:matlab.internal.language.introspective.errorDocCallback('wavReadInt', 'd:\users\jang\matlab\toolbox\sap\wavReadInt.m', 30)" style="font-weight:bold">wavReadInt</a> (<a href="matlab: opentoline('d:\users\jang\matlab\toolbox\sap\wavReadInt.m',30,0)">line 30</a>) In <a href="matlab:matlab.internal.language.introspective.errorDocCallback('qbtRetrievalExample01', 'D:\users\jang\books\audioSignalProcessing\example\qbtRetrievalExample01.m', 11)" style="font-weight:bold">qbtRetrievalExample01</a> (<a href="matlab: opentoline('D:\users\jang\books\audioSignalProcessing\example\qbtRetrievalExample01.m',11,0)">line 11</a>) In <a href="matlab:matlab.internal.language.introspective.errorDocCallback('goWriteOutputFile>dummyFunction', 'D:\users\jang\books\goWriteOutputFile.m', 85)" style="font-weight:bold">goWriteOutputFile>dummyFunction</a> (<a href="matlab: opentoline('D:\users\jang\books\goWriteOutputFile.m',85,0)">line 85</a>) In <a href="matlab:matlab.internal.language.introspective.errorDocCallback('goWriteOutputFile', 'D:\users\jang\books\goWriteOutputFile.m', 55)" style="font-weight:bold">goWriteOutputFile</a> (<a href="matlab: opentoline('D:\users\jang\books\goWriteOutputFile.m',55,0)">line 55</a>)] Compare the query with the database... Top-10 results: 1: songName=喔,蘇珊娜_不詳, distance=0.101353 2: songName=歡樂年華_不詳, distance=0.615711 3: songName=潑水歌_不詳, distance=0.629016 4: songName=捕魚歌_不詳, distance=0.638528 5: songName=火車快飛_不詳, distance=0.646215 6: songName=兩隻老虎_不詳, distance=0.655776 7: songName=yankee doodle_不詳, distance=0.660146 8: songName=三輪車_不詳, distance=0.702796 9: songName=魯啦啦_不詳, distance=0.709397 10: songName=小星星_不詳, distance=0.716578

As you can see from the above example, the correct intended song was identified via the use of normalized L1-norm, with a distance of 0.100238.

Task of Query by Tapping at MIREX 2008


Audio Signal Processing and Recognition (音訊處理與辨識)